A unidirectional imager would permit image formation along only one direction, from an input field-of-view (FOV) A to an output FOV B, while blocking image formation along the reverse path. Here, we report the first demonstration of unidirectional imagers, presenting polarization-insensitive and broadband unidirectional imaging based on successive diffractive layers that are linear and isotropic. These diffractive layers are optimized using deep learning and consist of hundreds of thousands of diffractive phase features, which collectively modulate the incoming fields and project an intensity image of the input onto an output FOV, while blocking the image formation in the reverse direction. After their deep learning-based training, the resulting diffractive layers are fabricated to form a unidirectional imager. As a reciprocal device, the diffractive unidirectional imager has asymmetric mode processing capabilities in the forward and backward directions, where the optical modes from B to A are selectively guided/scattered to miss the output FOV, whereas for the forward direction such modal losses are minimized, yielding an ideal imaging system between the input and output FOVs. Although trained using monochromatic illumination, the diffractive unidirectional imager maintains its functionality over a large spectral band and works under broadband illumination. We experimentally validated this unidirectional imager using terahertz radiation, and the experimental results closely matched our numerical results. Using the same deep learning-based design strategy, we also created a wavelength-selective unidirectional imager, where two unidirectional imaging operations, in reverse directions, are multiplexed through different illumination wavelengths. Diffractive unidirectional imaging using structured materials will have numerous applications in, e.g., security, defense, telecommunications and privacy protection.
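Despite the editing capability demonstrated in the latent space of a pretrained GAN model, inverting real-world images is stuck in a dilemma: the reconstruction cannot be faithful to the original input. The main reason is that the distributions of the training data and real-world data are misaligned, which makes GAN inversion unstable for real image editing. In this paper, we propose a novel GAN prior based editing framework that tackles the out-of-domain inversion problem with a composition-decomposition paradigm. In particular, in the composition phase, we introduce a differential activation module for detecting semantic changes from a global perspective, i.e., the relative gap between the features of the edited and unedited images. With the aid of the generated Diff-CAM mask, a coarse reconstruction can be intuitively composited from the paired original and edited images. In this way, the attribute-irrelevant regions survive almost in their entirety, while the quality of this intermediate result is still limited by an unavoidable ghosting effect. Consequently, in the decomposition phase, we further present a GAN prior based deghosting network that separates the final fine edited image from the coarse reconstruction. Extensive experiments show advantages over state-of-the-art methods in both qualitative and quantitative evaluations. The robustness and flexibility of our method are also validated in both single-attribute and multi-attribute manipulation scenarios.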
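As a rough illustration of the composition step, the sketch below blends an original and an edited image with a soft mask. The abstract does not give the exact formula, so the per-pixel linear blend, the function name `composite_with_mask`, and the representation of images as small grayscale grids are all assumptions made for this example, not the authors' implementation.

```python
def composite_with_mask(original, edited, mask):
    """Blend two same-sized grayscale images (lists of rows of floats).

    Assumption: mask values lie in [0, 1]; 1 keeps the edited pixel,
    0 keeps the original pixel, values in between mix the two.
    """
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("images and mask must have the same number of rows")
    result = []
    for orig_row, edit_row, mask_row in zip(original, edited, mask):
        if not (len(orig_row) == len(edit_row) == len(mask_row)):
            raise ValueError("rows must have the same length")
        # Per-pixel linear blend: m * edited + (1 - m) * original.
        result.append([m * e + (1.0 - m) * o
                       for o, e, m in zip(orig_row, edit_row, mask_row)])
    return result


original = [[0.0, 0.0], [1.0, 1.0]]
edited = [[1.0, 1.0], [0.0, 0.0]]
mask = [[1.0, 0.0], [0.5, 0.0]]
coarse = composite_with_mask(original, edited, mask)
```

Pixels where the mask takes intermediate values mix both images, which is one plausible source of the ghosting effect that the decomposition phase is meant to remove.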
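In this work, we build on recent advances in distributional reinforcement learning to provide a state-of-the-art distributional variant of the model based on IQN. We achieve this by using the generator and discriminator functions of a GAN model together with quantile regression to approximate the full quantile values of the state return distribution. We demonstrate improved performance on our baseline dataset, the 57 Atari 2600 games. In addition, we use our algorithm to show state-of-the-art training performance of risk-sensitive policies in Atari games, with policy optimization and evaluation.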
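For readers unfamiliar with quantile regression, the sketch below shows the standard quantile (pinball) loss for a single quantile level. It is generic background only, not the GAN-based method of the paper; the function names are hypothetical, and the brute-force search over sample values is used purely to show that minimizing this loss recovers an empirical quantile.

```python
def pinball_loss(target, prediction, tau):
    """Quantile-regression (pinball) loss for one quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    error = target - prediction
    # Under-prediction (error >= 0) is weighted by tau,
    # over-prediction (error < 0) by (1 - tau).
    return tau * error if error >= 0 else (tau - 1.0) * error


def mean_pinball_loss(samples, prediction, tau):
    """Average pinball loss of one predicted quantile over observed returns."""
    return sum(pinball_loss(s, prediction, tau) for s in samples) / len(samples)


def quantile_by_search(samples, tau):
    """Return the sample value that minimizes the mean pinball loss."""
    return min(sorted(samples), key=lambda q: mean_pinball_loss(samples, q, tau))


returns = [1.0, 2.0, 3.0, 4.0, 5.0]
median_estimate = quantile_by_search(returns, 0.5)
upper_estimate = quantile_by_search(returns, 0.9)
```

In a learned model the candidate quantile would be a network output trained by gradient descent on this loss rather than found by search.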
We present the interpretable meta neural ordinary differential equation (iMODE) method to rapidly learn generalizable (i.e., not parameter-specific) dynamics from trajectories of multiple dynamical systems that vary in their physical parameters. The iMODE method learns meta-knowledge, the functional variations of the force field of dynamical system instances without knowing the physical parameters, by adopting a bi-level optimization framework: an outer level capturing the common force field form among the studied dynamical system instances and an inner level adapting to individual system instances. A priori physical knowledge, such as a conservative force field and Euclidean symmetry, can be conveniently embedded in the neural network architecture as inductive bias. With the learned meta-knowledge, iMODE can model an unseen system within seconds and inversely reveal knowledge about the physical parameters of a system, or serve as a Neural Gauge to "measure" the physical parameters of an unseen system from observed trajectories. We test the validity of the iMODE method on bistable, double pendulum, Van der Pol, Slinky, and reaction-diffusion systems.
Existing solutions to network scheduling typically assume that the instantaneous link rates are completely known before a scheduling decision is made or consider a bandit setting where the accurate link quality is discovered only after it has been used for data transmission. In practice, the decision maker can obtain (relatively accurate) channel information, e.g., through beamforming in mmWave networks, right before data transmission. However, frequent beamforming incurs a formidable overhead in densely deployed mmWave WLANs. In this paper, we consider the important problem of throughput optimization with joint link probing and scheduling. The problem is challenging even when the link rate distributions are pre-known (the offline setting) due to the necessity of balancing the information gains from probing and the cost of reducing the data transmission opportunity. We develop an approximation algorithm with guaranteed performance when the probing decision is non-adaptive, and a dynamic programming based solution for the more challenging adaptive setting. We further extend our solutions to the online setting with unknown link rate distributions and develop a contextual-bandit based algorithm and derive its regret bound. Numerical results using data traces collected from real-world mmWave deployments demonstrate the efficiency of our solutions.
To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers 13 common text generation tasks and their corresponding 83 datasets and further incorporates 45 PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite the rich functionality, it is easy to use our library, either through the friendly Python API or command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at https://github.com/RUCAIBox/TextBox.
Convolutional neural networks (CNNs) have achieved remarkable success, but typically come with high computation cost and numerous redundant weight parameters. To reduce FLOPs, structured pruning is a popular approach that removes entire hidden structures by introducing coarse-grained sparsity. Meanwhile, many pruning works instead leverage fine-grained sparsity (where the sparsity is randomly distributed), but their sparse models lack a specially designed computing library for potential speedup. In this technical report, we study and present an efficient convolutional neural network inference system that accelerates the forward pass by utilizing the fine-grained sparsity of compressed CNNs. Our FSCNN is built on a set of specially designed sparse data structures, operators, and associated algorithms. Experimentally, we validate that FSCNN outperforms the standard deep learning library PyTorch on popular CNN architectures such as VGG16 when the sparsity is sufficiently high. However, due to the contiguity issue of sparse operators, FSCNN is typically not comparable with highly optimized dense operators. Therefore, coarse-grained (structured) sparsity is our recommendation for generic model compression.
Ensemble learning serves as a straightforward way to improve the performance of almost any machine learning algorithm. Existing deep ensemble methods usually naively train many different models and then aggregate their predictions. In our view, this is suboptimal in two respects: i) naively training multiple models adds a substantial computational burden, especially in the deep learning era; ii) purely optimizing each base model without considering their interactions limits the diversity of the ensemble and the performance gains. We tackle these issues by proposing deep negative correlation classification (DNCC), in which the accuracy and diversity trade-off is systematically controlled by decomposing the loss function seamlessly into individual accuracy and the correlation between individual models and the ensemble. DNCC yields a deep classification ensemble where each individual estimator is both accurate and negatively correlated. Thanks to the optimized diversity, DNCC works well even when utilizing a shared network backbone, which significantly improves its efficiency compared with most existing ensemble systems. Extensive experiments on multiple benchmark datasets and network structures demonstrate the superiority of the proposed method.
Transportation mode classification, the process of predicting the class labels of moving objects' transportation modes, has been widely applied to a variety of real-world applications, such as traffic management, urban computing, and behavior study. However, existing studies of transportation mode classification typically extract the explicit features of trajectory data but fail to capture the implicit features that affect the classification performance. In addition, most existing studies prefer to apply RNN-based models to embed trajectories, which is only suitable for classifying small-scale data. To tackle the above challenges, we propose an effective and scalable framework for transportation mode classification over GPS trajectories, abbreviated Estimator. Estimator is built on a CNN-TCN architecture, which is capable of leveraging the spatial and temporal hidden features of trajectories to achieve high effectiveness and efficiency. Estimator partitions the entire traffic space into disjoint spatial regions according to traffic conditions, which enhances scalability significantly and thus enables parallel transportation classification. Extensive experiments using eight public real-life datasets offer evidence that Estimator i) achieves superior model effectiveness (i.e., 99% accuracy and 0.98 F1-score), substantially outperforming the state of the art; ii) exhibits prominent model efficiency, obtaining 7-40x speedups over state-of-the-art learning-based methods; and iii) shows high model scalability and robustness, enabling large-scale classification analytics.
Natural language interaction is a promising direction for democratizing 3D shape design. However, existing methods for text-driven 3D shape editing face challenges in producing decoupled, local edits to 3D shapes. We address this problem by learning disentangled latent representations that ground language in 3D geometry. To this end, we propose a complementary tool set including a novel network architecture, a disentanglement loss, and a new editing procedure. Additionally, to measure edit locality, we define a new metric that we call part-wise edit precision. We show that our method outperforms existing SOTA methods by 20% in terms of edit locality, and up to 6.6% in terms of language reference resolution accuracy. Our work suggests that by solely disentangling language representations, downstream 3D shape editing can become more local to relevant parts, even if the model was never given explicit part-based supervision.